5 research outputs found

    Extracting scientific results from research articles

    Get PDF
    International audienceThe basic unit of information of use by researchers in theoretical fields are the mathematical results. We aim to build a knowledge base of these results, using information extraction techniques on scholarly documents. We present an algorithm which extracts mathematical results and references to mathematical results from scientific papers, using their PDF or LATEX sources. We analyse the results of our algorithm on the whole arXiv database of scientific papers and explore the resulting graph of mathematical results, which contains more than 6 million results and 4.5 million edges. We present attempts to link theorems of different papers using a TFIDF vectorizer or an autoencoder

    Towards Extraction of Theorems and Proofs in Scholarly Articles

    No full text
    International audienceScholarly articles in mathematical fields often feature mathematical statements (theorems, propositions, etc.) and their proofs. In this paper, we present preliminary work for extracting such information from PDF documents, with several types of approaches: vision (using YOLO), natural language (with transformers), and styling information (with linear conditional random fields). Our main task is to identify which parts of the paper to label as theorem-like environments and proofs. We rely on a dataset collected from arXiv, with LaTeX sources of research articles used to train the models

    Referate

    No full text
    corecore